Markov Blanket Discovery in Positive-Unlabelled and Semi-supervised Data
Abstract
The importance of Markov blanket discovery algorithms is twofold: they are the main building block of constraint-based structure learning algorithms for Bayesian networks, and they provide a technique for deriving the optimal set of features in filter feature selection approaches. Equally, learning from partially labelled data is a crucial and demanding area of machine learning, and extending techniques from fully to partially supervised scenarios is a challenging problem. While there are many different algorithms to derive the Markov blanket of fully supervised nodes, the partially labelled problem is far more challenging, and there is a lack of principled approaches in the literature. Our work derives a generalization of the conditional tests of independence for partially labelled binary target variables, which can handle the two main partially labelled scenarios: positive-unlabelled and semi-supervised. The result is a significantly deeper understanding of how to control false negative errors in Markov blanket discovery procedures and how unlabelled data can help.
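For readers unfamiliar with the conditional tests of independence mentioned above, the following is a minimal sketch of a standard G-test statistic for discrete, fully labelled data. It is not the generalized test derived in the paper, which this abstract does not spell out; the function name `g_statistic` and its arguments are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def g_statistic(x, y, z):
    """Return (G, dof) for testing whether X is independent of Y given Z.

    x, y, z are equal-length sequences of hashable discrete values.
    z holds one conditioning state per sample (use tuples for several
    conditioning variables). Illustrative sketch for fully labelled data.
    """
    n_xyz = Counter(zip(x, y, z))  # joint counts
    n_xz = Counter(zip(x, z))      # counts of (x, z)
    n_yz = Counter(zip(y, z))      # counts of (y, z)
    n_z = Counter(z)               # counts of z

    g = 0.0
    for (xi, yi, zi), observed in n_xyz.items():
        # Expected count under conditional independence within stratum zi.
        expected = n_xz[(xi, zi)] * n_yz[(yi, zi)] / n_z[zi]
        g += 2.0 * observed * math.log(observed / expected)

    # Degrees of freedom: (|X| - 1)(|Y| - 1) per observed stratum of Z.
    dof = (len(set(x)) - 1) * (len(set(y)) - 1) * len(n_z)
    return g, dof


# Usage: a feature identical to the target versus one unrelated to it.
target = [0, 0, 1, 1] * 10
related = list(target)
unrelated = [0, 1, 0, 1] * 10
stratum = [0] * 40  # a single conditioning state

g_dep, dof_dep = g_statistic(related, target, stratum)
g_ind, dof_ind = g_statistic(unrelated, target, stratum)
```

Independence is rejected when G exceeds the chi-square critical value for the given degrees of freedom (3.841 for one degree of freedom at the 0.05 level). In a Markov blanket discovery procedure, a test of this kind decides whether a candidate feature stays in the blanket, so a test that wrongly accepts independence produces a false negative.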
Similar resources
BASSUM: A Bayesian semi-supervised method for classification feature selection
Feature selection is an important preprocessing step for building efficient, generalizable and interpretable classifiers on high-dimensional data sets. Assuming sufficient labelled samples, the Markov blanket provides a complete and sound solution to the selection of optimal features, by exploring the conditional independence relationships among the features. In real-world ap...
Application of Data Mining Using Bayesian Belief Network To Classify Quality of Web Services
In this paper, we employed Naïve Bayes, Augmented Naïve Bayes, Tree Augmented Naïve Bayes, Sons & Spouses, Markov Blanket, Augmented Markov Blanket, Semi Supervised and Bayesian network techniques to rank web services. The Bayesian Network is demonstrated on a dataset taken from literature. The dataset consists of 364 web services whose quality is described by 9 attributes. Here, the attributes...
Composite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...
HITON: A Novel Markov Blanket Algorithm for Optimal Variable Selection
We introduce a novel, sound, sample-efficient, and highly scalable algorithm, called HITON, for variable selection in classification, regression and prediction. The algorithm works by inducing the Markov blanket of the variable to be classified or predicted. A wide variety of biomedical tasks with different characteristics were used for an empirical evaluation. Namely, (i) bioactivity...
Book Review: Computational Methods of Feature Selection
Feature selection selects a subset of relevant features, and also removes irrelevant and redundant features from the data to build robust learning models. Feature selection is very important, not only because of the curse of dimensionality, but also due to emerging data complexities and quantities faced by multiple disciplines, such as machine learning, data mining, pattern recognition, statist...